The Top Most Cited Books In Wikipedia
Obscure and usually expensive
Methods
One obvious application of databases such as DBpedia and Freebase is to use them as part of a bibliographic database, as Wikipedia topics could be used to classify books much the way that Library of Congress Subject Headings are used.
Freebase contains a property, /book/written_work/subjects, that links books to subjects, but it isn't well populated. Wikipedia, however, contains millions of links to pages like
http://en.wikipedia.org/wiki/Special:BookSources/978-0-936389-27-1
where the number is an ISBN, which is an almost unique identifier for a book edition.
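One practical wrinkle behind that "almost" is that the same ISBN can be written with or without hyphens, and in either the older 10-digit or the current 13-digit form. The sketch below shows one way to fold these spellings into a single key before counting. The function names are hypothetical, and it assumes that converting everything to a hyphen-free ISBN-13 is an acceptable normalization. It does not verify the incoming check digit.

```python
from typing import Optional


def isbn10_to_isbn13(isbn10: str) -> str:
    """Convert a 10-character ISBN (hyphens already removed) to ISBN-13."""
    # Drop the old check digit and add the "978" book prefix.
    core = "978" + isbn10[:9]
    # ISBN-13 check digit: weights alternate 1, 3 over the first 12 digits.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> Optional[str]:
    """Return a hyphen-free ISBN-13, or None if the input is not ISBN-shaped."""
    digits = raw.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if (
        len(digits) == 10
        and digits[:9].isdigit()
        and (digits[9].isdigit() or digits[9] == "X")
    ):
        return isbn10_to_isbn13(digits)
    return None


print(normalize_isbn("978-0-936389-27-1"))
print(normalize_isbn("0-306-40615-2"))
```

Normalizing this way keeps two spellings of one edition from being counted as two books, although different editions of the same work still carry different ISBNs.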
The BookSources links cannot be found in the Pagelinks dataset in DBpedia because the pagelinks extraction doesn't capture all links that are created by macros (templates, in Wikipedia's terminology), such as the {{cite}} templates that usually generate these links.
Although it would be possible to write something that parses Wikipedia markup and extracts these links explicitly, it's much simpler to take the rendered HTML, pretend that the semantic web is already here, and pull the ISBNs out of the links.
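A minimal sketch of that approach, using only the Python standard library, might look like the following. This is not the code behind the numbers reported here: the class and function names are hypothetical, and it assumes the article HTML has already been downloaded into a dictionary keyed by article title. It also assumes the link target contains a literal "Special:BookSources/", so links with a URL-encoded colon would be missed.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Matches the ISBN at the end of a Special:BookSources link target.
BOOKSOURCES_RE = re.compile(r"Special:BookSources/([0-9Xx\-]+)")


class BookSourceParser(HTMLParser):
    """Collects the ISBNs found in the href attributes of <a> tags."""

    def __init__(self):
        super().__init__()
        self.isbns = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        match = BOOKSOURCES_RE.search(href)
        if match:
            # Strip hyphens so differently formatted links compare equal.
            self.isbns.add(match.group(1).replace("-", "").upper())


def isbns_in_page(html: str) -> set:
    """Return the set of distinct ISBNs linked from one article's HTML."""
    parser = BookSourceParser()
    parser.feed(html)
    return parser.isbns


def count_citing_articles(pages: dict) -> Counter:
    """Map each ISBN to the number of distinct articles that cite it."""
    counts = Counter()
    for html in pages.values():
        counts.update(isbns_in_page(html))
    return counts


# Tiny made-up pages, only to show the shape of the input.
pages = {
    "Article A": '<a href="/wiki/Special:BookSources/978-0-936389-27-1">ISBN</a>',
    "Article B": '<a href="/wiki/Special:BookSources/9780936389271">ISBN</a>',
}
print(count_citing_articles(pages).most_common(10))
```

Collecting ISBNs into a set per page means a book cited several times in one article counts once, which matches a "cited by N articles" tally.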
Results
When you look at the top cited documents in collections of papers or other articles, the results don't come as a surprise. The readers and writers of that literature overlap, so the top cited works tend to be things everybody knows. (One caveat: scientists are a little surprised that review papers are so widely read.)
When we look at books cited in Wikipedia, however, the most prevalent are reference books that cover a large number of topics in a specialized area. Often these books are obscure, out of print, hard to find, and expensive. The scattershot nature reflects the fact that certain Wikipedia authors have been systematic in documenting their sources while others have not been.
The results here are fairly typical, however, in that every knowledge base seems to have a unique personality or "point of view", which is often a bit bent compared to the way most people think.
1 -- British Hit Singles And Albums
Cited by 4045 articles
2 -- Stanovništvo: popis stanovništva, domaćinstva i stanova u 2002.
Cited by 3037 articles
We don't have a picture of this book, and it can't be found at Amazon.com, but it can be found on OpenLibrary. It appears to be a gazetteer that covers the Republic of Serbia.
3 -- California's Geographic Names: A Gazetteer of Historic and Modern Names of the State
Cited by 2922 articles
4 -- Mammal Species of the World: A Taxonomic and Geographic Reference
Cited by 2205 articles
5 -- Die Ritterkreuzträger [Hauptbd.]
Cited by 1855 articles
It took some effort to figure this one out, but this book is about Nazi recipients of the Knight's Cross awards in WWII.
OCLC link
OCLC link to English edition
6 -- The Directory of Railway Stations: Details Every Public and Private Passenger Station, Halt, Platform and Stopping Place, Past and Present
Cited by 1804 articles
7 -- Wrestling Title Histories
Cited by 1635 articles
8 -- Die Träger des Ritterkreuzes des Eisernen Kreuzes, 1939-1945: Die Inhaber der höchsten Auszeichnung des Zweiten Weltkrieges aller Wehrmachtteile (German Edition)
Cited by 1576 articles
9 -- Encyclopaedia of Mathematics, 10 Volume Set
Cited by 1500 articles
10 -- Ships of the Royal Navy: A Complete Record of All Fighting Ships from the 15th Century to the Present
Cited by 1466 articles